Malware detection using external call characteristics

ABSTRACT

A malware scanner 2, for malware such as computer viruses, worms, Trojans and the like, utilises the external call characteristics associated with known items of malware to identify the presence of malware within a computer file. Malware written in a high level language can take a variety of different forms as object code when compiled, but these different object code forms will usually share external call characteristics to a sufficient degree to allow the presence of such external call characteristics to generically and accurately identify different compiled variants of the source code malware.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates to the field of data processing systems. More particularly, this invention relates to the detection of malware, such as, for example, computer viruses, worms, Trojans and the like, within computer programs.

[0003] 2. Description of the Prior Art

[0004] It is known to provide malware detection systems that examine the code of a computer program to identify characteristics corresponding to known items of malware. These characteristics can be considered to be signatures of the malware. Common approaches utilise binary search strings to look for these characteristics and checksums to detect the alteration of known computer programs.

[0005] The known techniques are not well suited to generically detecting programs written in high level languages, such as C or VisualBasic. A problem with programs written in such high level languages is that if they are recompiled with other compilers or compiler options, or the source code is changed in a relatively minor manner, then the binary search strings needed to detect them are significantly altered. This alteration means that a signature developed to detect a particular variant of an item of malware written in a high level language will often fail to detect a minor variant thereof. As an example, if the source code for a Trojan is available on the Internet, then there often occur many dozens of variants of the Trojan which re-use some or all of the source code that has been made publicly available. Whilst the different items of malware so produced from the same source code have functional similarities, it is difficult with known techniques to develop a signature capable of detecting such variants.

[0006] The present invention addresses the problem of generically detecting groups of programs produced from the same source code.

SUMMARY OF THE INVENTION

[0007] Viewed from one aspect the present invention provides a computer program product for controlling a computer to detect a computer program containing malware, said computer program product comprising:

[0008] search code operable to search said computer program for external call instructions;

[0009] comparison code operable to compare said external call instructions within said computer program with at least one predetermined external call instruction characteristic determined from a plurality of external calls and corresponding to known malware; and

[0010] identification code operable to identify said computer program as containing malware if said external call instructions within said computer program match a predetermined external call instruction characteristic corresponding to known malware.

[0011] The present technique recognises that compiled program code reflects the source code from which it was produced in the sense that external calls performed by the program tend to appear in a characteristic order reflecting their appearance in the source code. Even a simple program to read a line of text and add it to a file may make the following external calls: read the registry, open a file, read a file, write a file and close a file. Programs with a reasonable level of functionality written in contemporary programming forms, such as in the Win32 environment, typically perform thousands of external calls. The program carries out so many of these external calls that their quantity, order, location, distribution and/or other characteristics are a good way of identifying the programs concerned from their functionality without a requirement to fully emulate their action. Thus, by analysing the external calls present within a computer program, a fingerprint for selectively identifying that computer program may be produced, and malware written in a high level language can be detected in a variety of different compiled forms by detecting the common characteristics of the external calls made by those different compiled forms.
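As a minimal sketch of this idea, and not a definitive implementation, the following Python fragment reduces a program to the ordered sequence of its external calls and ignores where those calls sit in the file. The call traces, the offsets and the function name call_fingerprint are hypothetical values chosen for illustration; the Win32 API names merely stand in for the registry and file operations mentioned above.

```python
# Hypothetical call traces: (file offset, external call name).
# variant_a and variant_b stand for two compilations of the same source:
# the offsets differ, the order of the external calls does not.
variant_a = [
    (0x1000, "RegOpenKeyExA"),
    (0x1040, "CreateFileA"),
    (0x1090, "ReadFile"),
    (0x10F0, "WriteFile"),
    (0x1120, "CloseHandle"),
]
variant_b = [
    (0x2400, "RegOpenKeyExA"),
    (0x2466, "CreateFileA"),
    (0x24D2, "ReadFile"),
    (0x2558, "WriteFile"),
    (0x25A0, "CloseHandle"),
]
# An unrelated program making different calls.
unrelated = [
    (0x1000, "CreateFileA"),
    (0x1020, "CloseHandle"),
]


def call_fingerprint(calls):
    """Return the external call names in offset order, discarding the offsets."""
    return tuple(name for _, name in sorted(calls))


same_source = call_fingerprint(variant_a) == call_fingerprint(variant_b)
different_program = call_fingerprint(variant_a) != call_fingerprint(unrelated)
```

An exact match on the whole sequence is stricter than the text requires; it is used here only to show that byte-level differences between variants need not affect the external call characteristics.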

[0012] It will be appreciated that the external calls of which the characteristics are identified can take a variety of different forms. These external calls can be calls to an associated operating system, calls to a dynamic link library associated with the computer program and/or calls to a run-time library joined with the computer program by the compiler.

[0013] A characteristic of preferred embodiments is that the searching of the computer program for external calls will search the entire computer program as it may not be possible to determine in advance that the external calls, if present, will occur at some particular location within the computer program.

[0014] The predetermined external call instruction characteristics can take a wide variety of different forms. One particularly preferred type of characteristic is the identification of a predetermined set of characterising external calls within a computer program. It will be appreciated that these calls could take place in a variety of different sequences; it is the presence of such a collection of calls together, possibly within predetermined relative positions of one another, which is characteristic of the malware to be identified.

[0015] The predetermined sets of characterising external calls can include logic within their definition of the characteristics being searched, e.g. a preferred embodiment can incorporate wildcard external call markers whereby any external call occurring at a particular point or within a particular range is considered as matching irrespective of its characteristics.
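The wildcard behaviour can be sketched as follows. This is an illustrative assumption about how such a marker might be represented: the names WILDCARD, matches_at and find_pattern are hypothetical, and the pattern is matched against consecutive external calls only.

```python
WILDCARD = None  # hypothetical marker: matches any single external call


def matches_at(calls, pattern, start):
    """True if `pattern` matches the calls beginning at index `start`."""
    window = calls[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False
    return all(p is WILDCARD or p == c for p, c in zip(pattern, window))


def find_pattern(calls, pattern):
    """Return the index of the first match of `pattern`, or -1 if none."""
    for start in range(len(calls) - len(pattern) + 1):
        if matches_at(calls, pattern, start):
            return start
    return -1


calls = ["LoadLibraryA", "GetProcAddress", "CreateFileA", "WriteFile", "CloseHandle"]

# The middle call may be anything; only its presence is required.
with_wildcard = find_pattern(calls, ["GetProcAddress", WILDCARD, "WriteFile"])
# Without the wildcard the two named calls would have to be adjacent.
without_wildcard = find_pattern(calls, ["GetProcAddress", "WriteFile"])
```

A wildcard covering a range of calls, as the paragraph also contemplates, would need a repeat count on the marker; that extension is omitted here.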

[0016] Further external call characteristics that can be examined are the presence of parameter values associated with external calls, e.g. within a predetermined relative location of particular external calls.
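One possible way to test for a parameter value near a call is sketched below. The function name parameter_near_call, the sample code bytes and the window sizes are assumptions made for illustration; the sketch looks only at the bytes preceding the call instruction, which is where pushed arguments would normally be found in 32-bit x86 code.

```python
def parameter_near_call(code: bytes, call_offset: int, param: bytes, window: int) -> bool:
    """True if `param` lies wholly within the `window` bytes before `call_offset`."""
    start = max(0, call_offset - window)
    return code.find(param, start, call_offset) != -1


# Illustrative code fragment (hypothetical contents):
#   90 90              two NOPs
#   68 02 00 00 80     PUSH 80000002
#   FF 15 00 20 40 00  indirect CALL through an import slot
#   90                 NOP
code = bytes.fromhex("9090" "6802000080" "ff1500204000" "90")
call_offset = 7  # offset of the FF 15 call instruction within `code`
param = bytes.fromhex("6802000080")

found_in_wide_window = parameter_near_call(code, call_offset, param, window=16)
found_in_narrow_window = parameter_near_call(code, call_offset, param, window=3)
```

The narrow window fails because the five-byte parameter does not fit inside the three bytes examined, which shows how the relative location constraint tightens the match.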

[0017] As a preliminary step in the analysis of a computer program which may contain malware, preferred embodiments of the present technique serve to analyse the computer program to determine identifying characteristics of external calls prior to searching the computer program for those external calls. As an example, the link information within an import table or the like is examined and the computer program searched to identify any associated run-time library in order that calls to links identified within the import table or locations within the run-time library are identified as external calls which will be subject to comparison with the predetermined external call characteristics.

[0018] It will be appreciated that the present technique could be used to identify a variety of different types of malware, such as, for example, computer viruses, worms and Trojans.

[0019] Viewed from another aspect the present invention provides a method of detecting a computer program containing malware, said method comprising the steps of:

[0020] searching said computer program for external call instructions;

[0021] comparing said external call instructions within said computer program with at least one predetermined external call instruction characteristic determined from a plurality of external calls and corresponding to known malware; and

[0022] identifying said computer program as containing malware if said external call instructions within said computer program match a predetermined external call instruction characteristic corresponding to known malware.

[0023] Viewed from a further aspect the present invention provides apparatus for detecting a computer program containing malware, said apparatus comprising:

[0024] search logic operable to search said computer program for external call instructions;

[0025] comparison logic operable to compare said external call instructions within said computer program with at least one predetermined external call instruction characteristic determined from a plurality of external calls and corresponding to known malware; and

[0026] identification logic operable to identify said computer program as containing malware if said external call instructions within said computer program match a predetermined external call instruction characteristic corresponding to known malware.

[0027] The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] FIG. 1 schematically illustrates a malware scanner;

[0029] FIG. 2 schematically illustrates the generation of a compiled program from a high level language source code program;

[0030] FIG. 3 schematically illustrates one type of compiled computer program including external calls;

[0031] FIG. 4 schematically illustrates a malware characteristic in the form of a set of external calls with given relative positions;

[0032] FIG. 5 is a flow diagram schematically illustrating the technique of identifying malware using external call characteristics;

[0033] FIG. 6 schematically illustrates database entries within a database of malware external call characteristics; and

[0034] FIG. 7 is a diagram schematically illustrating the architecture of a general purpose computer of the type which may be used to implement the above techniques.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0035] FIG. 1 illustrates a malware scanner 2 including a scanner engine 4 and a malware definition database 6. File access requests generated by a program shell/loader (such as user inputs trying to start execution of an executable file) are sent to an operating system 8. A malware scanner interface 10 within the operating system 8 intercepts these file access requests and forwards them to the malware scanner 2. The malware scanner 2 uses the scanner engine 4 in conjunction with the malware definition database 6 to scan the file to which an access request has been made to determine whether or not it contains malware. A pass or fail result is passed back to the operating system 8. If the scan is passed, then the requested access to the file is permitted (e.g. the executable file is permitted to run). If a fail result is passed back, then malware found actions are triggered, such as blocking access to the file concerned, deleting the file concerned, quarantining the file concerned, generating alert messages and the like.

[0036] FIG. 2 illustrates the generation of a compiled computer program from a high level language source code program. A computer program writer will typically generate a computer program in a high level language, such as C, VisualBasic or the like. This source code program 12 is then supplied to a compiler 14 which compiles the source code program 12 in accordance with compile options set by the user to generate compiled code. The compiled code is passed to a linker 16 which links calls within the compiled computer program to appropriate routines within a library of routines 18 to generate the compiled and linked program. The compiled and linked program may take the form of a compiled program 20 and an associated dynamic link library (DLL) 22 or a compiled program 24 joined to a runtime library (RTL) 26. These different forms may also be combined depending upon the compile options and the environment within which the computer program is to execute. The computer program 20, 24 is the reflection of the source code program 12 with characteristic external calls being made to library functions either within the DLL 22 or the RTL 26. It is the characteristics of these external calls which may be used to generically identify a compiled computer program 20, 24 produced from some common source code.

[0037] FIG. 3 illustrates a Win32 PE file using RTL external calls and API external calls. The Win32 PE file 28 includes a computer program portion 30 including one or more external calls. These external calls may be via an import table 32 to yield an API external call to an operating system or DLL, or alternatively may be to an RTL 34 joined to the computer program 30.

[0038] In accordance with the present technique, the import table 32 is examined in combination with the call instructions within the computer program 30 to determine the characteristics of an API external call which can then be matched against a database of external call characteristics known to correspond to malware. The Win32 PE file 28 is also initially examined to identify the boundaries of the RTL 34 (e.g. by utilising similar techniques whereby the characteristic external API calls made by the RTL 34 may be detected and used to identify the start and end of the RTL 34). Other known characteristics of the RTL could also be used to identify its boundaries. Once the boundaries of the RTL 34 are known, a call in the computer program to a location within the RTL 34 will be classified as an RTL external call which can form part of the characteristics of a known item of malware to be detected.
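The classification described above can be sketched as follows, assuming the import table and the RTL boundaries have already been recovered from the file. The Layout class, the classify_call function and all addresses are hypothetical; parsing a real PE file is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical summary of a file after the preliminary analysis."""
    import_table: dict  # address of import slot -> imported function name
    rtl_start: int      # first address belonging to the joined RTL
    rtl_end: int        # first address after the joined RTL (exclusive)


def classify_call(target: int, layout: Layout) -> str:
    """Classify a call target as an API external call, an RTL external call, or internal."""
    if target in layout.import_table:
        return "API:" + layout.import_table[target]
    if layout.rtl_start <= target < layout.rtl_end:
        return "RTL"
    return "internal"


layout = Layout(
    import_table={0x402000: "RegOpenKeyExA", 0x402004: "GetProcAddress"},
    rtl_start=0x405000,
    rtl_end=0x409000,
)

api_call = classify_call(0x402000, layout)
rtl_call = classify_call(0x405123, layout)
internal_call = classify_call(0x401000, layout)
```

Only the first two classes would then be compared against the database of malware external call characteristics; calls classified as internal are not external calls in the sense used here.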

[0039] FIG. 4 schematically illustrates a malware characteristic within a computer program comprising a set of three external calls separated by characteristic relative distances. It will be appreciated that in practice a malware characteristic may be formed of a considerably greater number of external calls and that the relationships between the external calls can be expressed in a variety of different ways. A malware characteristic may include logic, such as fuzzy logic, whereby alternatives for different variants may be specified within the malware characteristic and ranges of parameters, parameter locations, call locations and the like may also be specified either separately or in combination.
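A set of three calls with relative distances might be matched as in the sketch below. The representation is an assumption made for illustration: each later call carries a minimum and maximum distance from the previously matched call, so that small shifts between compiled variants are tolerated. The names match_from and has_characteristic and all offsets are hypothetical.

```python
def match_from(calls, steps, prev_offset):
    """True if the remaining `steps` can be matched after a call at `prev_offset`."""
    if not steps:
        return True
    name, min_gap, max_gap = steps[0]
    for offset, call in calls:
        if call == name and min_gap <= offset - prev_offset <= max_gap:
            if match_from(calls, steps[1:], offset):
                return True
    return False


def has_characteristic(calls, first, steps):
    """True if `first` occurs and is followed by `steps` at the permitted distances."""
    return any(match_from(calls, steps, offset)
               for offset, name in calls if name == first)


# Characteristic: CreateFileA, then WriteFile 0x20..0x60 bytes later,
# then CloseHandle 0x10..0x40 bytes after that.
first = "CreateFileA"
steps = [("WriteFile", 0x20, 0x60), ("CloseHandle", 0x10, 0x40)]

variant_a = [(0x1000, "CreateFileA"), (0x1030, "WriteFile"), (0x1050, "CloseHandle")]
variant_b = [(0x2000, "CreateFileA"), (0x2048, "WriteFile"), (0x2060, "CloseHandle")]
spread_out = [(0x1000, "CreateFileA"), (0x1200, "WriteFile"), (0x1210, "CloseHandle")]

match_a = has_characteristic(variant_a, first, steps)
match_b = has_characteristic(variant_b, first, steps)
match_spread = has_characteristic(spread_out, first, steps)
```

Expressing each distance as a range rather than a single value is one simple way to realise the "ranges of call locations" mentioned in the paragraph.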

[0040] FIG. 5 schematically illustrates a flow diagram of a system utilising the present technique. At step 36 the import table from a computer file to be scanned is read. At step 38 the computer file is searched for any embedded runtime library, possibly using a technique such as looking for characteristic API external calls that are made from within a runtime library. At step 40 the characteristics of calls which are to be classified as external calls are established. At step 42 the first predetermined external call characteristic of a known piece of malware is selected. Step 44 then searches the file to find external calls, a set of external calls, or some other external call characteristic that matches the current malware external call characteristic being searched. Usually the entire computer file will need to be searched for external calls in order to be sure that the relevant malware external call characteristics are not within the computer program. However, to increase scanning speed the characteristics may be chosen from only some portion of the file (e.g. the beginning of the code section). At step 46 a test is made as to whether or not a match of the malware external call characteristics has been made. If a match has been made, then step 48 triggers malware found actions such as deleting the file concerned, quarantining the file concerned, denying access to the file concerned, generating alert messages and the like. If a match is not found, then processing proceeds to step 50 at which a determination is made as to whether the final malware external call characteristic within the database of malware external call characteristics has yet been reached. If the final malware external call characteristic has not yet been reached, then processing proceeds to step 52 at which the next malware external call characteristic is selected and processing returns to step 44. If the end of the database has been reached, then processing terminates.
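The loop of steps 42 to 52 can be sketched as below. Steps 36 to 40 are assumed to have already produced the ordered list of external calls found in the file. The matching rule used here (the characterising calls must appear in order, with other calls permitted in between) is a deliberately simple stand-in for whatever characteristic a real database would hold, and the names scan, is_subsequence and the entries "Example.A" and "Example.B" are hypothetical.

```python
def is_subsequence(needle, haystack):
    """True if the items of `needle` appear in `haystack` in the same order."""
    remaining = iter(haystack)
    return all(item in remaining for item in needle)


def scan(calls, database):
    """Return the name of the first matching characteristic, or None if none match."""
    for name, characterising_calls in database:        # steps 42, 50 and 52
        if is_subsequence(characterising_calls, calls):  # steps 44 and 46
            return name                                  # step 48: malware found
    return None                                          # end of database reached


# Hypothetical database of malware external call characteristics.
database = [
    ("Example.A", ["RegOpenKeyExA", "CreateFileA", "WriteFile"]),
    ("Example.B", ["GetProcAddress", "CreateFileA"]),
]

suspect_calls = ["LoadLibraryA", "GetProcAddress", "CreateFileA", "CloseHandle"]
clean_calls = ["CreateFileA", "CloseHandle"]

suspect_result = scan(suspect_calls, database)
clean_result = scan(clean_calls, database)
```

The caller would trigger the malware found actions when a name is returned and permit access when the result is None.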

[0041] FIG. 6 schematically illustrates two database entries within the malware external call characteristics database that forms part of the malware definition database 6 which is used by the scanner engine 4. In the examples illustrated, a type of API call (for example “GetProcAddress” or “RegOpenKeyExA”), its relative location, an associated parameter value (for example 68 02 00 00 80 which is PUSH 80000002) and an associated relative parameter location may be specified as external call characteristics. It will be appreciated that all of these variables are potentially useful in different circumstances and the identification of some external calls will require a particular relative location to other items, or the presence of particular parameters, possibly within a certain range of location. It will also be appreciated that the database entries may include logic, such as fuzzy logic, embodying alternatives, such as AND or OR operations which are conducted between the specified call characteristics. The distances between calls can be measured as a useful identifying characteristic, e.g. in a file on disk or in the memory image of the file.
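One way such database entries and their AND/OR logic could be represented is sketched below. The CallEntry class, the nested-tuple expression format and the evaluate function are assumptions made for illustration, and the relative location fields are left out to keep the sketch short. "RegCreateKeyExA" is used only as an example of an alternative call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallEntry:
    """Hypothetical database leaf: an API call and an optional parameter value."""
    api_name: str
    parameter: Optional[bytes] = None  # None means any parameter is accepted


def leaf_matches(entry, observed):
    """True if some observed (name, parameter) pair satisfies `entry`."""
    return any(
        name == entry.api_name
        and (entry.parameter is None or entry.parameter == param)
        for name, param in observed
    )


def evaluate(expr, observed):
    """Evaluate a CallEntry or an ("AND" | "OR", operand, ...) expression."""
    if isinstance(expr, CallEntry):
        return leaf_matches(expr, observed)
    op, *operands = expr
    results = [evaluate(operand, observed) for operand in operands]
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError("unknown operator: " + repr(op))


PUSH_80000002 = bytes.fromhex("6802000080")  # parameter value from the example

entry = (
    "AND",
    CallEntry("GetProcAddress"),
    ("OR", CallEntry("RegOpenKeyExA", PUSH_80000002), CallEntry("RegCreateKeyExA")),
)

observed_match = [("GetProcAddress", None), ("RegOpenKeyExA", PUSH_80000002)]
observed_other_param = [("GetProcAddress", None),
                        ("RegOpenKeyExA", bytes.fromhex("6801000080"))]

result_match = evaluate(entry, observed_match)
result_other_param = evaluate(entry, observed_other_param)
```

Fuzzy logic, as mentioned in the paragraph, would replace the boolean results with scores and a threshold; the strict boolean form is used here for clarity.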

[0042] FIG. 7 schematically illustrates a general purpose computer 200 of the type that may be used to implement the above described techniques. The general purpose computer 200 includes a central processing unit 202, a random access memory 204, a read only memory 206, a network interface card 208, a hard disk drive 210, a display driver 212 and monitor 214 and a user input/output circuit 216 with a keyboard 218 and mouse 220 all connected via a common bus 222. In operation the central processing unit 202 will execute computer program instructions that may be stored in one or more of the random access memory 204, the read only memory 206 and the hard disk drive 210 or dynamically downloaded via the network interface card 208. The results of the processing performed may be displayed to a user via the display driver 212 and the monitor 214. User inputs for controlling the operation of the general purpose computer 200 may be received via the user input/output circuit 216 from the keyboard 218 or the mouse 220. It will be appreciated that the computer program could be written in a variety of different computer languages. The computer program may be stored and distributed on a recording medium or dynamically downloaded to the general purpose computer 200. When operating under control of an appropriate computer program, the general purpose computer 200 can perform the above described techniques and can be considered to form an apparatus for performing the above described technique. The architecture of the general purpose computer 200 could vary considerably and FIG. 7 is only one example.

[0043] Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims. 

We claim:
 1. A computer program product for controlling a computer to detect a computer program containing malware, said computer program product comprising: search code operable to search said computer program for external call instructions; comparison code operable to compare said external call instructions within said computer program with at least one predetermined external call instruction characteristic determined from a plurality of external calls and corresponding to known malware; and identification code operable to identify said computer program as containing malware if said external call instructions within said computer program match a predetermined external call instruction characteristic corresponding to known malware.
 2. A computer program product as claimed in claim 1, wherein said external call instructions comprise one or more of: a call to an operating system; a call to a dynamic link library associated with said computer program; and a call to a run-time library joined with said computer program.
 3. A computer program product as claimed in claim 1, wherein said search code searches for all external calls within said computer program.
 4. A computer program product as claimed in claim 1, wherein said one or more predetermined external call instruction characteristics comprise predetermined sets of characterising external calls.
 5. A computer program product as claimed in claim 4, wherein said sets of characterising external calls have associated characterising relative position information specifying relative position requirements for matching external calls within said computer program.
 6. A computer program product as claimed in claim 4, wherein at least one of said predetermined sets of characterising external calls includes at least one wildcard external call marker which matches any external call within said computer program.
 7. A computer program product as claimed in claim 4, wherein at least one of said predetermined sets of characterising external calls includes at least one parameterised characterising external call associated with a characterising parameter value, said parameterised characterising external call matching with an external call within said computer program if said characterising parameter value also matches a corresponding parameter value associated with said external call within said computer program.
 8. A computer program product as claimed in claim 7, wherein said characterising parameter value has associated relative position information specifying a relative position to said parameterised characterising external call within which a matching parameter value must be found.
 9. A computer program product as claimed in claim 1, comprising analysis code operable to analyse said computer program to determine identifying characteristics of external calls prior to said step of searching.
 10. A computer program product as claimed in claim 9, wherein said analysis includes one or more of: analysing link information associated with said computer program; and analysing a location of said computer program within a file to identify a boundary between said computer program and a joined run-time library.
 11. A computer program product as claimed in claim 1, wherein said malware is one or more of: a computer virus; a worm; and a Trojan.
 12. A method of detecting a computer program containing malware, said method comprising the steps of: searching said computer program for external call instructions; comparing said external call instructions within said computer program with at least one predetermined external call instruction characteristic determined from a plurality of external calls and corresponding to known malware; and identifying said computer program as containing malware if said external call instructions within said computer program match a predetermined external call instruction characteristic corresponding to known malware.
 13. A method as claimed in claim 12, wherein said external call instructions comprise one or more of: a call to an operating system; a call to a dynamic link library associated with said computer program; and a call to a run-time library joined with said computer program.
 14. A method as claimed in claim 12, wherein said step of searching said computer program searches for all external calls within said computer program.
 15. A method as claimed in claim 12, wherein said one or more predetermined external call instruction characteristics comprise predetermined sets of characterising external calls.
 16. A method as claimed in claim 15, wherein said sets of characterising external calls have associated characterising relative position information specifying relative position requirements for matching external calls within said computer program.
 17. A method as claimed in claim 15, wherein at least one of said predetermined sets of characterising external calls includes at least one wildcard external call marker which matches any external call within said computer program.
 18. A method as claimed in claim 15, wherein at least one of said predetermined sets of characterising external calls includes at least one parameterised characterising external call associated with a characterising parameter value, said parameterised characterising external call matching with an external call within said computer program if said characterising parameter value also matches a corresponding parameter value associated with said external call within said computer program.
 19. A method as claimed in claim 18, wherein said characterising parameter value has associated relative position information specifying a relative position to said parameterised characterising external call within which a matching parameter value must be found.
 20. A method as claimed in claim 12, comprising analysing said computer program to determine identifying characteristics of external calls prior to said step of searching.
 21. A method as claimed in claim 20, wherein said analysing includes one or more of: analysing link information associated with said computer program; and analysing a location of said computer program within a file to identify a boundary between said computer program and a joined run-time library.
 22. A method as claimed in claim 12, wherein said malware is one or more of: a computer virus; a worm; and a Trojan.
 23. Apparatus for detecting a computer program containing malware, said apparatus comprising: search logic operable to search said computer program for external call instructions; comparison logic operable to compare said external call instructions within said computer program with at least one predetermined external call instruction characteristic determined from a plurality of external calls and corresponding to known malware; and identification logic operable to identify said computer program as containing malware if said external call instructions within said computer program match a predetermined external call instruction characteristic corresponding to known malware.
 24. Apparatus as claimed in claim 23, wherein said external call instructions comprise one or more of: a call to an operating system; a call to a dynamic link library associated with said computer program; and a call to a run-time library joined with said computer program.
 25. Apparatus as claimed in claim 23, wherein said search logic searches for all external calls within said computer program.
 26. Apparatus as claimed in claim 23, wherein said one or more predetermined external call instruction characteristics comprise predetermined sets of characterising external calls.
 27. Apparatus as claimed in claim 26, wherein said sets of characterising external calls have associated characterising relative position information specifying relative position requirements for matching external calls within said computer program.
 28. Apparatus as claimed in claim 26, wherein at least one of said predetermined sets of characterising external calls includes at least one wildcard external call marker which matches any external call within said computer program.
 29. Apparatus as claimed in claim 26, wherein at least one of said predetermined sets of characterising external calls includes at least one parameterised characterising external call associated with a characterising parameter value, said parameterised characterising external call matching with an external call within said computer program if said characterising parameter value also matches a corresponding parameter value associated with said external call within said computer program.
 30. Apparatus as claimed in claim 29, wherein said characterising parameter value has associated relative position information specifying a relative position to said parameterised characterising external call within which a matching parameter value must be found.
 31. Apparatus as claimed in claim 23, comprising analysis logic operable to analyse said computer program to determine identifying characteristics of external calls prior to said step of searching.
 32. Apparatus as claimed in claim 31, wherein said analysis includes one or more of: analysing link information associated with said computer program; and analysing a location of said computer program within a file to identify a boundary between said computer program and a joined run-time library.
 33. Apparatus as claimed in claim 23, wherein said malware is one or more of: a computer virus; a worm; and a Trojan. 